NSF PAR Search | NSF Public Access Repository


Dynamically monitoring crowd-worker's reliability with interval-valued labels

https://doi.org/10.54941/ahfe1003270

Hu, Chenyi; Spurling, Makenzie (January 2023, AHFE International)

Crowdsourcing has rapidly become a computing paradigm in machine learning and artificial intelligence. In crowdsourcing, multiple labels are collected from crowd-workers on an instance, usually through the Internet. These labels are then aggregated into a single label intended to match the ground truth of the instance. Because crowdsourcing is open, human workers usually come with various levels of knowledge and diverse socio-economic backgrounds. Effectively handling such human factors has been a focus in the study and applications of crowdsourcing. For example, Bi et al studied the impacts of a worker's dedication, expertise, judgment, and task difficulty (Bi et al 2014). Qiu et al offered methods for selecting workers based on behavior prediction (Qiu et al 2016). Barbosa and Chen suggested rehumanizing crowdsourcing to deal with human biases (Barbosa 2019). Checco et al studied adversarial attacks on crowdsourcing for quality control (Checco et al 2020). Many more related works are available in the literature. In contrast to commonly used binary-valued labels, interval-valued labels (IVLs) have been introduced very recently (Hu et al 2021). Applying statistical and probabilistic properties of interval-valued datasets, Spurling et al quantitatively defined a worker's reliability in four measures: correctness, confidence, stability, and predictability (Spurling et al 2021). Calculating these measures, except correctness, does not require the ground truth of each instance but only the worker’s IVLs. Applying these quantified reliability measures has significantly improved the overall quality of crowdsourcing (Spurling et al 2022). However, in real-world applications, the reliability of a worker may vary over time rather than remain constant. It is therefore necessary to monitor a worker’s reliability dynamically. Because a worker j labels instances sequentially, we treat j’s IVLs as an interval-valued time series in our approach.
Assuming j’s reliability depends only on the IVLs within a time window, we calculate j’s reliability measures with the IVLs in the current time window. Moving the time window forward with our proposed practical strategies, we can monitor j’s reliability dynamically. Furthermore, the four reliability measures derived from IVLs are time varying too. With regression analysis, we can separate each reliability measure into an explainable trend and possible errors. To validate our approaches, we use four real-world benchmark datasets in our computational experiments. Here are the main findings. The reliability-weighted interval majority voting (WIMV) and weighted preferred matching probability (WPMP) schemes consistently outperform the base schemes, with much higher accuracy, precision, recall, and F1-score. Note: the base schemes are majority voting (MV), interval majority voting (IMV), and preferred matching probability (PMP). Through monitoring workers’ reliability, our computational experiments have successfully identified possible attackers. By removing the identified attackers, we have ensured the quality of the crowdsourced results. We have also examined the impact of window size selection. It is necessary to monitor a worker’s reliability dynamically, and our computational results provide evidence of the potential success of our approaches. This work is partially supported by the US National Science Foundation through the grant award NSF/OIA-1946391.
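The sliding-window idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas for the four reliability measures, so the two proxies below are assumptions made for illustration only: "confidence" as one minus the mean interval width, and "stability" as one minus the standard deviation of interval midpoints. The function names `window_measures`, `monitor`, and `linear_trend` are likewise hypothetical. IVLs are assumed to be pairs (lo, hi) with 0 <= lo <= hi <= 1.

```python
from statistics import mean, pstdev


def window_measures(ivls):
    """Illustrative proxies (assumptions, not the paper's definitions)."""
    widths = [hi - lo for lo, hi in ivls]
    mids = [(lo + hi) / 2 for lo, hi in ivls]
    return {
        # Narrower intervals are read here as higher confidence.
        "confidence": 1.0 - mean(widths),
        # Less spread in midpoints is read here as higher stability.
        "stability": 1.0 - pstdev(mids),
    }


def monitor(ivls, window_size, step=1):
    """Compute the proxy measures on each window of a worker's IVL series."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    results = []
    for start in range(0, len(ivls) - window_size + 1, step):
        results.append(window_measures(ivls[start:start + window_size]))
    return results


def linear_trend(values):
    """Least-squares slope and intercept of values against their index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# A worker whose intervals widen over time (made-up data).
series = [(0.4, 0.6), (0.4, 0.6), (0.2, 0.8), (0.0, 1.0)]
history = monitor(series, window_size=2)
slope, intercept = linear_trend([m["confidence"] for m in history])
```

A negative slope on the confidence series would flag a worker whose labels are becoming less certain. The residuals around the fitted line correspond to the "possible errors" term mentioned in the abstract, under the simplifying assumption of a linear trend.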
The Removal of Irrelevant Human Factors in a Multi-Review Corpus through Text Filtering

https://doi.org/10.54941/ahfe1003766

Moody, Aaron; Spurling, Makenzie; Hu, Chenyi (January 2023, Accelerating Open Access Science in Human Factors Engineering and Human-Centered Computing)

Generating a high-quality explainable summary of a multi-review corpus can help people save time in reading the reviews. With natural language processing and text clustering, people can generate both abstractive and extractive summaries on a corpus containing up to 967 product reviews (Moody et al. 2022). However, the overall quality of the summaries needs further improvement. Noticing that online reviews in the corpus come from a diverse population, we take an approach of removing irrelevant human factors through pre-processing. Applying available pre-trained models together with reference-based and reference-free metrics, we filter out noise in each review automatically prior to summary generation. Our computational experiments show that an explainable summary generated from such a pre-processed corpus can be of significantly higher overall quality than one generated from the original corpus. We suggest applying available high-quality pre-trained tools to filter noise rather than starting from scratch. Although this work is on a specific multi-review corpus, the methods and conclusions should be helpful for generating summaries for other multi-review corpora.
Estimating crowd-worker's reliability with interval-valued labels to improve the quality of crowdsourced work

https://doi.org/10.1109/SSCI50451.2021.9660043

Spurling, Makenzie; Hu, Chenyi; Zhan, Huixin; Sheng, Victor S. (December 2021, 2021 IEEE Symposium Series on Computational Intelligence (SSCI))

With inputs from human crowds, usually through the Internet, crowdsourcing has become a promising methodology in AI and machine learning for applications that require human knowledge. Researchers have recently proposed interval-valued labels (IVLs), instead of commonly used binary-valued ones, to manage uncertainty in crowdsourcing [19]. However, that work has not yet taken the crowd worker’s reliability into consideration. Crowd workers usually come from various social and economic backgrounds and have different levels of reliability. To further improve the overall quality of crowdsourcing with IVLs, this work presents practical methods that quantitatively estimate a worker’s reliability in terms of his/her correctness, confidence, stability, and predictability from his/her IVLs. Using the estimated reliability, this paper proposes two learning schemes: weighted interval majority voting (WIMV) and weighted preferred matching probability (WPMP). Computational experiments on sample datasets demonstrate that both WIMV and WPMP can significantly improve learning results, achieving higher precision, accuracy, and F1-score than other methods.
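To make the idea of reliability-weighted voting over interval labels concrete, here is a minimal Python sketch. It is not the WIMV scheme as defined in the paper, whose details the abstract does not state. It assumes each worker supplies an interval (lo, hi) within [0, 1] for the positive class and a non-negative reliability weight, aggregates the endpoints by weighted average, and decides by the midpoint of the aggregated interval. The function name `weighted_interval_vote` and the 0.5 threshold are hypothetical choices.

```python
def weighted_interval_vote(labels, weights):
    """Aggregate interval labels with reliability weights (illustrative only).

    labels: list of (lo, hi) pairs, each within [0, 1].
    weights: non-negative reliability weights, one per label.
    Returns (decision, (lo, hi)), where decision is 1 or 0.
    """
    if not labels or len(labels) != len(weights):
        raise ValueError("labels and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted average of the lower and upper endpoints.
    lo = sum(w * l for (l, _), w in zip(labels, weights)) / total
    hi = sum(w * h for (_, h), w in zip(labels, weights)) / total
    # Assumed decision rule: positive if the aggregated midpoint exceeds 0.5.
    decision = 1 if (lo + hi) / 2 > 0.5 else 0
    return decision, (lo, hi)


labels = [(0.6, 0.9), (0.7, 0.8), (0.0, 0.2)]
unweighted, _ = weighted_interval_vote(labels, [1.0, 1.0, 1.0])
weighted, _ = weighted_interval_vote(labels, [0.1, 0.1, 1.0])
```

With equal weights the two optimistic workers carry the vote. Once the third worker is given a much higher reliability weight, the aggregated interval shifts toward that worker's label and the decision flips, which is the effect reliability weighting is meant to produce.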
